AITopics | mean imputation

mean imputation

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BRITS: Bidirectional Recurrent Imputation for Time Series

Wei Cao, Dong Wang, Jian Li, Hao Zhou, Lei Li, Yitan Li

Neural Information Processing Systems, Feb-13-2026, 05:56:29 GMT

Our proposed method directly learns the missing values in a bidirectional recurrent dynamical system, without any specific assumption. The imputed values are treated as variables of the RNN graph and can be effectively updated during backpropagation. BRITS has three advantages: (a) it can handle multiple correlated missing values in time series; (b) it generalizes to time series with underlying nonlinear dynamics; (c) it provides a data-driven imputation procedure and applies to general settings with missing data. We evaluate our model on three real-world datasets: an air quality dataset, a healthcare dataset, and a localization dataset for human activity. Experiments show that our model outperforms state-of-the-art methods in both imputation and classification/regression.
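The value of looking in both directions along a series can be seen even without a learned model. The sketch below is a deliberately simple, non-learned heuristic and is not BRITS: BRITS treats imputations as RNN variables updated by backpropagation, while this code uses no RNN and no training. It fills each gap with the average of a forward pass (last observation carried forward) and a backward pass (next observation carried backward). The function names are illustrative.

```python
from typing import List, Optional

def forward_fill(xs: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward; leading gaps stay None."""
    out, last = [], None
    for x in xs:
        if x is not None:
            last = x
        out.append(last)
    return out

def bidirectional_impute(xs: List[Optional[float]]) -> List[Optional[float]]:
    """Average a forward and a backward fill over one univariate series.

    Observed values are returned unchanged. A gap that has estimates from both
    directions gets their mean; a gap at either end of the series falls back
    to whichever direction has an estimate.
    """
    fwd = forward_fill(xs)
    bwd = forward_fill(xs[::-1])[::-1]  # backward pass = forward fill on the reversed series
    out: List[Optional[float]] = []
    for x, f, b in zip(xs, fwd, bwd):
        if x is not None:
            out.append(x)
        elif f is not None and b is not None:
            out.append((f + b) / 2)
        else:
            out.append(f if f is not None else b)
    return out

print(bidirectional_impute([1.0, None, None, 4.0, None]))  # [1.0, 2.5, 2.5, 4.0, 4.0]
```

A one-directional fill would put 1.0 in both interior gaps. Adding the backward pass pulls those gaps toward the next observation, which is the intuition behind making the recurrent model bidirectional.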

artificial intelligence, imputation, machine learning

Neural Information Processing Systems

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > China > Beijing > Beijing (0.04)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

08fa43588c2571ade19bc0fa5936e028-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-7-2026, 09:46:03 GMT

attribution, generator, subset

Neural Information Processing Systems

Industry: Health & Medicine (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Imputation Uncertainty in Interpretable Machine Learning Methods

Golchian, Pegah, Wright, Marvin N.

arXiv.org Machine Learning, Dec-22-2025

In real data, missing values occur frequently, which affects the interpretations produced by interpretable machine learning (IML) methods. Recent work considers bias and shows that model explanations may differ between imputation methods, but it ignores the additional uncertainty introduced by imputation and its influence on variance and confidence intervals. We therefore compare the effects of different imputation methods on the confidence interval coverage probabilities of three IML methods: permutation feature importance, partial dependence plots, and Shapley values. We show that single imputation leads to underestimation of variance and that, in most cases, only multiple imputation comes close to nominal coverage.
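A small numeric example shows why single mean imputation underestimates variance: every imputed cell sits exactly at the center of the data. The sketch also pools several completed-data analyses with Rubin's rules, the textbook combination step for multiple imputation. The abstract does not say which pooling the authors use, so treat that part as an assumption. The function names `mean_impute` and `rubin_pool` are illustrative.

```python
from statistics import mean, variance
from typing import List, Optional, Sequence, Tuple

def mean_impute(xs: Sequence[Optional[float]]) -> List[float]:
    """Single imputation: replace every missing entry (None) with the observed mean."""
    m = mean(x for x in xs if x is not None)
    return [m if x is None else x for x in xs]

def rubin_pool(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m >= 2 completed-data analyses with Rubin's rules.

    Returns the pooled estimate and the total variance T = W + (1 + 1/m) * B,
    where W is the mean within-imputation variance and B is the sample
    (between-imputation) variance of the m estimates.
    """
    m = len(estimates)
    q_bar = mean(estimates)
    w = mean(variances)
    b = variance(estimates)  # divisor m - 1
    return q_bar, w + (1 + 1 / m) * b

observed_only = [2.0, 4.0, 6.0]
imputed = mean_impute([2.0, None, 4.0, None, 6.0])   # [2.0, 4.0, 4.0, 4.0, 6.0]
print(variance(observed_only), variance(imputed))    # 4.0 2.0
```

The imputed sample reports half the spread of the observed values. In contrast, the between-imputation term B in Rubin's rules explicitly adds back the uncertainty that single imputation discards.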

imputation, imputation uncertainty, mean imputation

arXiv.org Machine Learning

2512.17689

Country:

Europe > Germany > Bremen > Bremen (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

08fa43588c2571ade19bc0fa5936e028-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-1-2025, 23:42:21 GMT

artificial intelligence, generator, machine learning

Neural Information Processing Systems

Industry: Health & Medicine (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Conformal Prediction with Cellwise Outliers: A Detect-then-Impute Approach

Peng, Qian, Bao, Yajie, Ren, Haojie, Wang, Zhaojun, Zou, Changliang

arXiv.org Machine Learning, May-9-2025

Conformal prediction is a powerful tool for constructing prediction intervals for black-box models, providing a finite-sample coverage guarantee for exchangeable data. However, this exchangeability is compromised when some entries of the test feature are contaminated, as in the case of cellwise outliers. To address this issue, this paper introduces a novel framework called detect-then-impute conformal prediction. This framework first applies an outlier detection procedure to the test feature and then uses an imputation method to fill in the cells identified as outliers. To quantify the uncertainty in the processed test feature, we adaptively apply the detection and imputation procedures to the calibration set, thereby constructing exchangeable features for the conformal prediction interval of the test label. We develop two practical algorithms, PDI-CP and JDI-CP, and provide a distribution-free coverage analysis under some commonly used detection and imputation procedures. Notably, JDI-CP achieves a finite-sample $1-2\alpha$ coverage guarantee. Numerical experiments on both synthetic and real datasets demonstrate that our proposed algorithms exhibit robust coverage properties and efficiency comparable to the oracle baseline.
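The detect-then-impute pipeline can be sketched as split conformal prediction with a cleaning step in front. This is not PDI-CP or JDI-CP, and it does not carry their coverage guarantees. It applies one fixed cleaning rule to calibration and test points alike, whereas the paper adapts the calibration processing to the test point. Several pieces are assumptions: the detector is a simple per-column z-score rule, the imputer replaces a flagged cell with its column mean, and `mu`, `sd` and `predict` are hypothetical inputs taken from clean training data and a pre-fitted model.

```python
import math
from typing import Callable, List, Sequence, Tuple

def detect(x: Sequence[float], mu: Sequence[float], sd: Sequence[float],
           thresh: float = 3.0) -> List[bool]:
    """Flag a cell as a cellwise outlier if it lies more than `thresh` SDs from its column mean."""
    return [abs(v - m) > thresh * s for v, m, s in zip(x, mu, sd)]

def impute(x: Sequence[float], flags: Sequence[bool], mu: Sequence[float]) -> List[float]:
    """Replace flagged cells with the column mean; keep the other cells."""
    return [m if f else v for v, f, m in zip(x, flags, mu)]

def detect_then_impute_interval(
    predict: Callable[[List[float]], float],
    calib_X: Sequence[Sequence[float]],
    calib_y: Sequence[float],
    x_test: Sequence[float],
    mu: Sequence[float],
    sd: Sequence[float],
    alpha: float = 0.1,
) -> Tuple[float, float]:
    def clean(x: Sequence[float]) -> List[float]:
        return impute(x, detect(x, mu, sd), mu)

    # Process every calibration point with the same detect-then-impute step,
    # then score it by its absolute residual (the nonconformity score).
    scores = sorted(abs(y - predict(clean(x))) for x, y in zip(calib_X, calib_y))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # split-conformal quantile rank
    q = math.inf if k > n else scores[k - 1]
    y_hat = predict(clean(x_test))
    return y_hat - q, y_hat + q

# Toy usage with a hypothetical linear model; the test point's second cell is contaminated.
predict = lambda x: x[0] + x[1]
calib_X = [[0.0, 0.0] for _ in range(9)]
calib_y = [0.1 * i for i in range(1, 10)]
lo, hi = detect_then_impute_interval(predict, calib_X, calib_y, [0.5, 50.0],
                                     mu=[0.0, 0.0], sd=[1.0, 1.0], alpha=0.2)
```

Without the cleaning step, the contaminated cell (50.0) would dominate the point prediction. After imputation, the interval is centered on 0.5.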

artificial intelligence, data mining, machine learning

arXiv.org Machine Learning

2505.04986

Country:

Asia > China > Shanghai > Shanghai (0.04)
South America > Brazil (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

To impute or not to impute: How machine learning modelers treat missing data

Chen, Wanyi, Cummings, Mary

arXiv.org Artificial Intelligence, Mar-20-2025

Missing data is prevalent in the tabular data used to train machine learning (ML) models, and different missing data treatment methods can significantly affect ML model training results. However, little is known about how ML researchers and engineers choose missing data treatment methods and what factors affect their choices. To this end, we conducted a survey of 70 ML researchers and engineers. Our results revealed that most participants were not making informed decisions about missing data treatment, which could significantly affect the validity of the ML models they train. We advocate for better education on missing data, more standardized missing data reporting, and better missing data analysis tools.

artificial intelligence, data quality, machine learning

arXiv.org Artificial Intelligence

2503.16644

Country:

Oceania > New Zealand (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Alaska (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Fast Iterative and Task-Specific Imputation with Online Learning

Bordoloi, Rahul, Réda, Clémence, Bej, Saptarshi

arXiv.org Artificial Intelligence, Jan-23-2025

Missing feature values are a significant hurdle for downstream machine-learning tasks such as classification and regression, yet they are pervasive in many real-life use cases, for instance in drug discovery research. Moreover, imputation methods can be time-consuming and offer few guarantees on imputation quality, especially under not-missing-at-random mechanisms. We propose an imputation approach named F3I, based on the iterative improvement of a K-nearest neighbor imputation that learns the weights for each neighbor of a data point, optimizing for the most likely distribution of points over data points. This algorithm can also be trained jointly with a downstream task on the imputed values. We provide a theoretical analysis of the imputation quality of F3I for several types of missingness mechanisms. We also demonstrate the performance of F3I on synthetic data sets and on real-life drug repurposing and handwritten-digit recognition data.
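F3I starts from a K-nearest neighbor imputation, so the sketch below shows only that starting point. The weights here are fixed inverse-distance weights, whereas F3I learns them iteratively and can train them jointly with a downstream task. The distance (root-mean-square difference over co-observed columns), `k`, and `eps` are assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]

def knn_impute(data: Sequence[Row], k: int = 2, eps: float = 1e-9) -> List[Row]:
    """Fill each missing cell (None) from its k nearest donor rows.

    Distance is the root-mean-square difference over the columns that both
    rows observe. Donors are combined with fixed inverse-distance weights;
    F3I instead learns per-neighbor weights, which this sketch does not do.
    Donor values are read from the original data, never from earlier imputations.
    """
    out = [list(row) for row in data]  # copy so the input is not mutated
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue  # a donor must observe column j
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue  # no overlapping columns, so no meaningful distance
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if not candidates:
                continue  # nothing to impute from; leave the cell missing
            nearest = sorted(candidates)[:k]
            weights = [1.0 / (d + eps) for d, _ in nearest]
            out[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return out

data = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [10.0, 20.0]]
filled = knn_impute(data, k=2)  # row 2 is filled from rows 1 and 0, weighted 2:1
```

Replacing the fixed `weights` list with trainable parameters is the point at which a learned scheme such as F3I departs from this baseline.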

artificial intelligence, imputation, machine learning

arXiv.org Artificial Intelligence

2501.13786

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)
Asia > India > Kerala > Thiruvananthapuram (0.04)

Genre: Research Report > New Finding (0.92)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.87)
Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)

Exploiting the Data Gap: Utilizing Non-ignorable Missingness to Manipulate Model Learning

Koyuncu, Deniz, Gittens, Alex, Yener, Bülent, Yung, Moti

arXiv.org Artificial Intelligence, Sep-6-2024

Missing data is commonly encountered in practice, and when the missingness is non-ignorable, effective remediation depends on knowledge of the missingness mechanism. Learning the underlying missingness mechanism from the data is not possible in general, so adversaries can exploit this fact by maliciously engineering non-ignorable missingness mechanisms. Such Adversarial Missingness (AM) attacks have only recently been motivated and introduced, and then successfully tailored to mislead causal structure learning algorithms into hiding specific cause-and-effect relationships. However, existing AM attacks assume the modeler (victim) uses full-information maximum likelihood methods to handle the missing data, and are of limited applicability when the modeler uses different remediation strategies. In this work we focus on associational learning in the context of AM attacks. We consider (i) complete case analysis, (ii) mean imputation, and (iii) regression-based imputation as alternative strategies used by the modeler. Instead of combinatorially searching for missing entries, we propose a novel probabilistic approximation by deriving the asymptotic forms of these methods used for handling the missing entries. We then formulate the learning of the adversarial missingness mechanism as a bi-level optimization problem. Experiments on generalized linear models show that AM attacks can be used to change the p-values of features from significant to insignificant in real datasets, such as the California-housing dataset, while using relatively moderate amounts of missingness (<20%). Additionally, we assess the robustness of our attacks against defense strategies based on data valuation.
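The three remediation strategies the attack targets can be compared on a toy two-variable dataset: complete case analysis, mean imputation, and regression-based imputation. In this sketch only `x2` has missing values, and regression imputation is ordinary least squares of `x2` on `x1` fitted to the complete cases. Both choices are assumptions for illustration. The sketch shows the strategies, not the adversarial attack.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[float, Optional[float]]

def complete_case(rows: Sequence[Pair]) -> List[Tuple[float, float]]:
    """(i) Complete case analysis: drop every row whose x2 is missing."""
    return [(x1, x2) for x1, x2 in rows if x2 is not None]

def mean_imputation(rows: Sequence[Pair]) -> List[Tuple[float, float]]:
    """(ii) Mean imputation: fill missing x2 with the mean of the observed x2."""
    m = mean(x2 for _, x2 in rows if x2 is not None)
    return [(x1, m if x2 is None else x2) for x1, x2 in rows]

def regression_imputation(rows: Sequence[Pair]) -> List[Tuple[float, float]]:
    """(iii) Regression imputation: fill missing x2 with an OLS prediction from x1.

    The fit uses complete cases only and needs at least two distinct x1 values.
    """
    cc = complete_case(rows)
    mx = mean(a for a, _ in cc)
    my = mean(b for _, b in cc)
    sxx = sum((a - mx) ** 2 for a, _ in cc)
    sxy = sum((a - mx) * (b - my) for a, b in cc)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [(x1, (intercept + slope * x1) if x2 is None else x2) for x1, x2 in rows]

rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, None)]
print(mean_imputation(rows)[3], regression_imputation(rows)[3])  # (4.0, 4.0) (4.0, 8.0)
```

The three strategies already disagree on the same missing cell. Because they respond differently to which cells go missing, an attacker who controls the missingness pattern would need a different mechanism for each one.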

imputation, mechanism, missingness mechanism

arXiv.org Artificial Intelligence

2409.04407

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.89)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Explainability of Machine Learning Models under Missing Data

Vo, Tuan L., Nguyen, Thu, Hammer, Hugo L., Riegler, Michael A., Halvorsen, Pal

arXiv.org Artificial Intelligence, Jun-29-2024

Missing data is a prevalent issue that can significantly impair model performance and interpretability. This paper briefly summarizes the development of the field of missing data with respect to Explainable Artificial Intelligence and experimentally investigates the effects of various imputation methods on the calculation of Shapley values, a popular technique for interpreting complex machine learning models. We compare different imputation strategies and assess their impact on feature importance and interaction as determined by Shapley values. We also theoretically analyze the effects of missing values on Shapley values. Importantly, our findings reveal that the choice of imputation method can introduce biases that change the Shapley values, thereby affecting the interpretability of the model. Moreover, a lower test prediction mean squared error (MSE) does not necessarily imply a lower MSE in Shapley values, and vice versa. Also, although XGBoost can handle missing data directly, using XGBoost directly on missing data can seriously affect interpretability compared to imputing the data before training XGBoost. This study provides a comprehensive evaluation of imputation methods in the context of model interpretation, offering practical guidance for selecting appropriate techniques based on dataset characteristics and analysis objectives. The results underscore the importance of considering imputation effects to ensure robust and reliable insights from machine learning models.
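A tiny exact computation shows one way imputation can shape Shapley values. The sketch enumerates every coalition and defines the value of a coalition by filling absent features from a baseline vector. This "baseline" value function is only one of several in use, and the paper's setup may differ, so treat it as an assumption. If a missing feature is mean-imputed and the baseline is that same mean, the feature receives exactly zero attribution, whatever its true value was.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating all coalitions (fine for small n).

    A coalition's value is f evaluated with coalition features taken from x
    and all other features taken from the baseline.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for s in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi

f = lambda z: 2 * z[0] + 3 * z[1] - z[2]   # toy linear model
means = [1.0, 1.0, 1.0]                    # baseline = feature means
x = [2.0, 1.0, 4.0]                        # x[1] was missing and mean-imputed to 1.0
phi = shapley_values(f, x, means)          # approximately [2.0, 0.0, -3.0]
```

For a linear model, this value function gives `phi[i] = w[i] * (x[i] - baseline[i])`, so the imputed feature's attribution vanishes. A different imputer would move `x[1]` away from the baseline and give it a nonzero share.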

imputation, imputation method, shapley value

arXiv.org Artificial Intelligence

2407.00411

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Privacy Preserving Data Imputation via Multi-party Computation for Medical Applications

Jentsch, Julia, Ünal, Ali Burak, Mağara, Şeyma Selcan, Akgün, Mete

arXiv.org Artificial Intelligence, May-29-2024

Handling missing data is crucial in machine learning, but many datasets contain gaps due to errors or non-response. Beyond traditional methods such as listwise deletion, which are simple but inadequate, the literature offers more sophisticated and effective methods that preserve sample size and improve accuracy. However, these methods require access to the whole dataset, which conflicts with privacy regulations when the data is distributed among multiple sources. Especially in the medical and healthcare domain, such access reveals sensitive information about patients. This study addresses privacy-preserving imputation of sensitive data using secure multi-party computation, which enables computation without revealing any party's sensitive information. We realize the mean, median, regression, and kNN imputation methods in a privacy-preserving way. We specifically target the medical and healthcare domains, given the importance of protecting patient data, and showcase our methods on a diabetes dataset. Experiments on the diabetes dataset validated the correctness of our privacy-preserving imputation methods, with a largest error of around $3 \times 10^{-3}$, closely matching the plaintext methods. We also analyzed the scalability of our methods to varying numbers of samples, showing their applicability to real-world healthcare problems. Our analysis demonstrated that all our methods scale linearly with the number of samples. Except for kNN, the runtimes of all our methods indicate that they can be used on large datasets.
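Mean imputation is the simplest of the four methods to distribute, because it only needs a global sum and a global count. The sketch below computes them with additive secret sharing. It is a toy, semi-honest illustration, not the paper's protocol. The prime modulus `PRIME`, the fixed-point factor `SCALE`, and the function names are assumptions. Each party sees only random-looking shares of the others' local sums and counts, and only the aggregate sum and count are ever reconstructed.

```python
import secrets
from typing import List, Optional, Sequence

PRIME = 2**61 - 1  # field modulus (assumption: any prime well above the scaled sums works)
SCALE = 1000       # fixed-point scale for real-valued data (assumption)

def share(value: int, n_parties: int) -> List[int]:
    """Split an integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: Sequence[int]) -> int:
    return sum(shares) % PRIME

def secure_mean(local_data: Sequence[Sequence[Optional[float]]]) -> float:
    """Global mean of the observed (non-None) values held by several parties."""
    n = len(local_data)
    # Each party secret-shares its local fixed-point sum and its count of observed values.
    sum_shares = [share(round(sum(v for v in d if v is not None) * SCALE), n) for d in local_data]
    cnt_shares = [share(sum(1 for v in d if v is not None), n) for d in local_data]
    # Party p adds up the p-th share received from every party; it never sees a raw local sum.
    agg_sum = [sum(s[p] for s in sum_shares) % PRIME for p in range(n)]
    agg_cnt = [sum(c[p] for c in cnt_shares) % PRIME for p in range(n)]
    total, count = reconstruct(agg_sum), reconstruct(agg_cnt)
    if total > PRIME // 2:  # map the field element back to a signed integer
        total -= PRIME
    return total / SCALE / count

def impute_with_secure_mean(local_data: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Each party fills its own missing cells with the jointly computed mean."""
    m = secure_mean(local_data)
    return [[m if v is None else v for v in d] for d in local_data]

parties = [[1.0, None, 3.0], [5.0, None], [None, 7.0]]
print(impute_with_secure_mean(parties))  # [[1.0, 4.0, 3.0], [5.0, 4.0], [4.0, 7.0]]
```

Fixed-point encoding introduces a small rounding error (here up to about 1/SCALE per party sum). Protocols like the paper's likewise report results that closely match, but do not exactly equal, plaintext computation. Median, regression, and kNN imputation need comparisons or multiplications on shared values, which simple addition of shares cannot provide.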

imputation, imputation method, privacy preserving data imputation

arXiv.org Artificial Intelligence

2405.18878

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.16)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.88)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
